
126 research outputs found

Linear Coupling: An Ultimate Unification of Gradient and Mirror Descent

Authors: Allen-Zhu Zeyuan; Orecchia Lorenzo
Publication date: 07/11/2016

First-order methods play a central role in large-scale machine learning. Even though many variations exist, each suited to a particular problem, almost all such methods fundamentally rely on two types of algorithmic steps: gradient descent, which yields primal progress, and mirror descent, which yields dual progress. We observe that the performances of gradient and mirror descent are complementary, so that faster algorithms can be designed by LINEARLY COUPLING the two. We show how to reconstruct Nesterov's accelerated gradient methods using linear coupling, which gives a cleaner interpretation than Nesterov's original proofs. We also discuss the power of linear coupling by extending it to many other settings that Nesterov's methods cannot apply to.
Comment: A new section added; polished writing

arXiv.org e-Print Archive

Boston University Institutional Repository (OpenBU)

Dagstuhl Research Online Publication Server
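To make the idea of linearly coupling the two steps concrete, here is a minimal Python sketch of the Euclidean special case, assuming an L-smooth convex f and the step sizes α_{k+1} = (k+2)/(2L) and τ_k = 1/(α_{k+1}L) = 2/(k+2). With the squared Euclidean distance as the mirror map, the mirror step reduces to a plain gradient step on a separate sequence z. The helper `linear_coupling`, the parameter choice, and the quadratic test function are illustrative assumptions, not code from the paper.

```python
from typing import Callable, List

Vector = List[float]


def linear_coupling(grad: Callable[[Vector], Vector], x0: Vector,
                    L: float, T: int) -> Vector:
    """Euclidean linear-coupling sketch for an L-smooth convex f (hypothetical helper)."""
    y = list(x0)  # gradient-descent sequence (primal progress)
    z = list(x0)  # mirror-descent sequence (dual progress)
    for k in range(T):
        alpha = (k + 2) / (2 * L)  # assumed mirror step size alpha_{k+1}
        tau = 1.0 / (alpha * L)    # assumed coupling weight tau_k = 2 / (k + 2)
        # Linear coupling: x is a convex combination of z and y.
        x = [tau * zi + (1 - tau) * yi for zi, yi in zip(z, y)]
        g = grad(x)
        # Gradient step from x with step size 1/L.
        y = [xi - gi / L for xi, gi in zip(x, g)]
        # Mirror step from z; with the Euclidean mirror map it is a plain gradient step.
        z = [zi - alpha * gi for zi, gi in zip(z, g)]
    return y


# Usage: f(x) = 0.5 * sum(d_i * x_i^2) is L-smooth with L = max(d_i), minimized at x* = 0.
D = [1.0, 10.0, 100.0]


def quad(x: Vector) -> float:
    return 0.5 * sum(d * xi * xi for d, xi in zip(D, x))


def quad_grad(x: Vector) -> Vector:
    return [d * xi for d, xi in zip(D, x)]


x0 = [1.0, 1.0, 1.0]
y_T = linear_coupling(quad_grad, x0, L=max(D), T=200)
print(quad(y_T))
```

For this parameter choice, a standard telescoping argument bounds the gap by f(y_T) − f(x*) ≤ 2L‖x0 − x*‖²/(T+1)², an O(1/T²) rate compared with the O(1/T) rate of plain gradient descent; for the quadratic above (x* = 0, L = 100, T = 200) that bound is about 0.015.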

Variance Reduction for Faster Non-Convex Optimization

Authors: Allen-Zhu Zeyuan; Hazan Elad
Publication date: 01/01/2016

We consider the fundamental problem in non-convex optimization of efficiently reaching a stationary point. In contrast to the convex case, in the long history of this basic problem, the only known theoretical results for first-order non-convex optimization are still those for full gradient descent, which converges in $O(1/\varepsilon)$ iterations for smooth objectives, and stochastic gradient descent, which converges in $O(1/\varepsilon^2)$ iterations for objectives that are sums of smooth functions. We provide the first improvement in this line of research. Our result is based on the variance reduction trick recently introduced to convex optimization, as well as a brand-new analysis of variance reduction that is suitable for non-convex optimization. For objectives that are sums of $n$ smooth functions, our first-order minibatch stochastic method converges with an $O(1/\varepsilon)$ rate and is faster than full gradient descent by $\Omega(n^{1/3})$. We demonstrate the effectiveness of our methods on empirical risk minimization with non-convex loss functions and on training neural nets.
Comment: polished writing
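Reading the rates above as counts of component-gradient evaluations, under our own assumptions that f = (1/n)∑_i f_i and that one full gradient costs n evaluations of ∇f_i (the listing does not spell out the cost model), the claimed Ω(n^{1/3}) speedup over full gradient descent corresponds to this arithmetic:

```latex
\begin{aligned}
\text{cost of full GD} &= \underbrace{O(1/\varepsilon)}_{\text{iterations}} \cdot \underbrace{n}_{\nabla f_i \text{ per full gradient}} = O\!\left(\frac{n}{\varepsilon}\right),\\
\text{cost with an } \Omega(n^{1/3}) \text{ speedup} &= O\!\left(\frac{n/\varepsilon}{n^{1/3}}\right) = O\!\left(\frac{n^{2/3}}{\varepsilon}\right).
\end{aligned}
```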

arXiv.org e-Print Archive

Princeton University Open Access Repository
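The variance-reduction trick this abstract builds on can be illustrated with a plain SVRG-style loop. This is a minimal sketch under stated assumptions: scalar x, single-sample (not minibatch) updates, a fixed step size, and the last iterate returned (non-convex analyses often return a randomly chosen iterate instead). The function `svrg`, the non-convex loss log(1 + (x − a_i)²), the data `A`, and all parameter values are hypothetical choices for illustration, not the paper's algorithm or schedule.

```python
import random
from typing import Callable


def svrg(grad_i: Callable[[int, float], float], n: int, x0: float,
         step: float, epochs: int, inner: int, seed: int = 0) -> float:
    """SVRG-style sketch on f(x) = (1/n) * sum_i f_i(x) for scalar x (hypothetical helper)."""
    rng = random.Random(seed)  # seeded for reproducibility
    x = x0
    for _ in range(epochs):
        snapshot = x
        # Full gradient at the snapshot: one pass over all n components.
        full = sum(grad_i(i, snapshot) for i in range(n)) / n
        for _ in range(inner):
            i = rng.randrange(n)
            # Variance-reduced estimator: unbiased for grad f(x), and its variance
            # shrinks as x and the snapshot approach each other.
            v = grad_i(i, x) - grad_i(i, snapshot) + full
            x -= step * v
    return x


# Hypothetical non-convex components f_i(x) = log(1 + (x - a_i)^2).
A = [-1.0, -0.5, 0.0, 0.5, 1.0]


def cauchy_grad(i: int, x: float) -> float:
    d = x - A[i]
    return 2.0 * d / (1.0 + d * d)  # derivative of log(1 + d^2)


def full_grad(x: float) -> float:
    return sum(cauchy_grad(i, x) for i in range(len(A))) / len(A)


x_out = svrg(cauchy_grad, len(A), x0=2.0, step=0.05, epochs=40, inner=100)
print(x_out, full_grad(x_out))
```

Each component here is 2-smooth (its second derivative lies in [−1/4, 2]), so the step 0.05 is well below 1/L, and because `A` is symmetric the full gradient vanishes at x = 0. Since the estimator's variance vanishes as x approaches the snapshot, a fixed step can still drive the gradient to zero, which plain SGD with a fixed step generally cannot.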